ABSTRACT

Multiple-choice standardized achievement tests of English
vocabulary and reading comprehension and of mathematics were
administered to samples of 592 grade eight students and 615 grade
five students. Two forms of each test unit were prepared. The control
groups took forms containing items with four responses, while the
experimental groups took forms which had an additional response of I
don't know. A few fictitious vocabulary items having no right answers
were included in each of the English test units. In grade eight tests
the mean scores of the control groups were higher. For the grade five
samples there were no differences in mathematics, and differences in
English (the control group obtaining higher scores) were found only
for low ability students. Item discrimination indices obtained from
the two forms did not show any significant differences. There was a
negative linear relationship between percentage choosing the I don't
know response and percentage correct. In general, lower ability
students used the response more often than those with higher ability
except in the case of the fictitious items. Sex differences were also
found to be a factor in the use of the I don't know response.
(Author/MV)
Multiple-choice standardized achievement tests of English
vocabulary and reading comprehension and of mathematics were
administered to samples of 592 grade eight students and 615 grade five
students. Two forms of each test unit were prepared. The control
groups took forms containing items with four responses, while the
experimental groups took forms which had an additional response of
I don't know. A few fictitious vocabulary items having no right
answers were included in each of the English test units.

It was found that in grade eight tests the mean scores of the
control groups were higher. For the grade five samples there were
no differences in mathematics, and differences in English (the
control group obtaining higher scores) were found only for low ability
students. Item discrimination indices obtained from the two forms
did not show any significant differences. There was a negative
linear relationship between percentage choosing the I don't know
response and percentage correct. Some characteristics of students
choosing the I don't know response were identified.
A STUDY OF THE I DON'T KNOW RESPONSE
IN MULTIPLE-CHOICE TESTS

Lai-Min Paul Lee and William E. Coffman
Although multiple-choice testing has been accepted and extensively
used in determining achievement in schools and in making decisions
about college admission and hiring, one of the fundamental issues in
this form of testing is the problem of guessing. The tendency to
guess by examinees appears to be not a stable trait, but rather one
that may vary with the age, sex, race, personality, or motivation of
the examinees (Votaw, 1936; Swineford, 1938, 1941; Gritten & Johnson,
1941; Sheriffs & Boomer, 1954; Slakter, 1967). Various researchers
have examined the effect on item difficulties of response alternatives
designed to discourage guessing (Wesman & Bennett, 1946; Rimland, 1960;
Williamson, 1967). To take account of the guessing factor in multiple-
choice testing, a number of correction formulae have been proposed
(Guilford, 1936; Horst, 1933). Recently Gene Glass (Burton, 1972)
adapted the correction formula to situations in which examinees could
eliminate more than zero but fewer than a-1 of the incorrect
alternatives.
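As a point of reference, the conventional correction for guessing subtracts a fraction of the wrong answers from the number of right answers, on the assumption that wrong answers come from blind guessing among a alternatives. The sketch below illustrates that standard formula only. It is not the adaptation attributed to Glass, and the function name and sample numbers are hypothetical.

```python
def corrected_score(rights: int, wrongs: int, n_alternatives: int) -> float:
    """Conventional correction for guessing: R - W / (a - 1).

    Omitted items are neither rewarded nor penalized.
    """
    if n_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return rights - wrongs / (n_alternatives - 1)


# Hypothetical examinee: 20 right, 6 wrong, 4 omitted on four-choice items.
print(corrected_score(20, 6, 4))  # 18.0
```

Under this formula an I don't know response would be treated like an omit: it carries no penalty, whereas a wrong guess costs 1/(a-1) of a point.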
There are not many reports in the literature on the use of I don't
know as an alternative in multiple-choice tests to reduce guessing
(The Secondary School Examin..., 1972). The Metropolitan Achievement
Tests and a large number of National Assessment Exercises include
I don't know as one of the options. Recent National Assessment Science
Exercises results (Sherman, 1973) indicated that the tendency to give
the I don't know response increased with age, while the tendency to
choose incorrect alternatives decreased with age. In general, females
at all ages gave I don't know responses to multiple-choice science
exercises more often than males. Blacks tended to use I don't know
responses more often than the nation as a whole, even though the
differences may be negligible at the three younger age levels.

The present study was designed to investigate the effects of an
additional response of I don't know to items in multiple-choice
standardized achievement tests of English vocabulary and reading
comprehension and of mathematics in grades five and eight. Two forms of
each test unit were prepared. The forms containing items with four
responses were given to groups of subjects referred to as the control
groups; the experimental groups received the forms of test units
containing identical items except with an additional I don't know
response.

The following questions were posed and studied in this
investigation:

1. Will the scores of the control groups on tests with four
alternative responses be higher than the scores of the experimental
groups on corresponding tests with an additional I don't know response?

2. Are there any particular items with four responses that differ
significantly in item difficulty from the same items with an additional
I don't know response?

3. How would the items with an additional I don't know response
compare in item discrimination with the items without this response?

4. What is the relationship, if any, between the percentage
choosing the I don't know response and the percentage getting the item
correct in tests containing this additional response?

5. What are the characteristics of students who choose the I
don't know response on tests with this additional response?

Procedure 

A total of 592 grade eight students and 615 grade five students
from eighteen schools in eleven school districts in the State of Iowa
were included in the study. Four tests were constructed for each
grade, two in English and two in mathematics. The test units for the
control groups contained items with four responses. The test units
for the experimental groups were identical except that each item
contained a fifth response, I don't know. Items in these units were
selected from various forms of the Iowa Tests of Basic Skills (ITBS).
Four fictitious vocabulary items with no correct answers were included
in each of the English tests. The testing time for each test was 20
minutes. The four test units were distributed so that each successive
student in the class received a different test, resulting in
approximately equivalent fourths of the students taking each test unit.
All these students had taken the 1973 regular examination of ITBS;
scores from the appropriate ITBS subtests were used as the criterion
scores for analyzing the items in the experimental tests and for
controlling for random differences in the basic ability between
experimental and control groups.
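The distribution scheme described above, in which successive students receive different test units, amounts to a round-robin (spiraled) assignment. A minimal sketch follows. The unit labels, the function name, and the class size are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import cycle

# Hypothetical labels for the four test units used within one grade.
TEST_UNITS = ("English-control", "English-experimental",
              "Math-control", "Math-experimental")


def assign_units(n_students: int) -> list[str]:
    """Give each successive student the next test unit in rotation."""
    rotation = cycle(TEST_UNITS)
    return [next(rotation) for _ in range(n_students)]


# A hypothetical class of 30 students: unit counts differ by at most one.
counts = Counter(assign_units(30))
print(counts)
```

Because the rotation ignores who the students are, each unit ends up with roughly a quarter of every class, which is what makes the four groups approximately equivalent.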

Analysis

Item difficulty (p-rights) and discrimination indices (r-biserial)
were calculated for each item for each of the four experimental tests
using scores on the related ITBS subtest as the criterion. The
technique of analysis of covariance, using the related ITBS subtest
scores as the control, was used to investigate whether scores of the
control group were higher than scores of the experimental group. An arc
sine transformation of the p-rights was performed to compare the item
difficulties of items of the control group and the experimental group.
A matched pair t-test was used to compare the r-biserials of items in
the tests of the control group and the experimental group. The technique
of analysis of covariance was also used to compare percentages choosing
the I don't know response for the items in each of the four tests
containing this additional response. In this case, the control variable
was the percentages choosing the right answers for these tests. The
characteristics of the group choosing the I don't know response were
also analyzed by tabulating the percentage of students using this
response at least once and the mean number of uses of this response by
ability level and by sex.

Results

As shown in Table 1, in the grade eight English and mathematics
tests, the mean scores of the control group in tests with four
responses were higher than the mean scores of the experimental group in
tests with an additional fifth response of I don't know. The mean
scores of the two groups were not different for the grade five
mathematics tests. The regression lines of the fifth grade groups on the
English tests had different slopes, with the poorer students tending
to get lower test scores when the I don't know response was available.

Table 1

Summary of Criterion Scores and Test Scores

                         X Mean   X S.D.   Y Mean   Y S.D.
Grade 5 English
  Control                110.5    28.7     15.3     4.7
  Experimental           112.7    27.9     14.9     5.5
Grade 5 Mathematics
  Control                114.4    25.1     17.7     5.7
  Experimental           115.2    25.3     17.7     ...
Grade 8 English
  Control                172.5    41.1     18.7     6.0
  Experimental           168.9    37.2     17.0     5.9
Grade 8 Mathematics
  Control                178.2    38.3     21.1     6.1
  Experimental           172.9    35.6     19.3     6.1

X is criterion score
Y is test score

Generally, the item difficulty figures were consistent with the
mean scores. There seemed to be no overall difference in p-rights in
grade five English and mathematics tests. As a whole, the items with
four responses in grade eight English and mathematics tests had higher
p-rights than corresponding items with the I don't know response.
For each comparison, when the average difference over all items is
taken into account, the remaining differences can be attributed to
sampling error.
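The comparison of p-rights relied on an arc sine transformation, which is commonly used because it makes the sampling variance of a proportion nearly independent of the proportion itself. The sketch below shows one common variant, 2 * arcsin(sqrt(p)), whose variance is approximately 1/n, and uses it to compare one item's p-right in two groups. The exact variant and test statistic used in the study are not stated, so the function names, the z statistic, and the sample values are assumptions for illustration.

```python
import math


def arcsine_transform(p: float) -> float:
    """Variance-stabilizing transform of a proportion: 2 * asin(sqrt(p))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return 2.0 * math.asin(math.sqrt(p))


def difficulty_z(p_control: float, n_control: int,
                 p_experimental: float, n_experimental: int) -> float:
    """Approximate z for the difference between two transformed p-rights.

    Each transformed proportion has a sampling variance of about 1/n.
    """
    diff = arcsine_transform(p_control) - arcsine_transform(p_experimental)
    return diff / math.sqrt(1.0 / n_control + 1.0 / n_experimental)


# Hypothetical item: 62% right among 150 control students,
# 55% right among 148 experimental students.
z = difficulty_z(0.62, 150, 0.55, 148)
print(round(z, 2))
```

A positive z here means the item was easier in the four-response form. Subtracting the average difference over all items before testing each item would correspond to the adjustment described in the paragraph above.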

Results of the matched pair t-tests of r-biserials between items
of the control group and the experimental group, as shown in Table 2,
did not show any significant differences, even though the criterion
tests did not include the I don't know response and thus were more
like the tests taken by the control group than the tests taken by the
experimental group.
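The t values in Table 2 can be checked from the other two columns, since a matched pair t statistic is the mean difference divided by its standard error. The sketch below recomputes t from the tabled means and standard errors as transcribed here. Because those inputs are rounded, the recomputed values can only be expected to agree with the tabled t values in magnitude to about two decimals. The dictionary layout and function name are illustrative only.

```python
# Mean difference in r-biserial (experimental minus control), its standard
# error, and degrees of freedom, as read from Table 2.
TABLE_2 = {
    "Grade 5 English": (0.05, 0.025, 26),
    "Grade 5 Mathematics": (0.00, 0.019, 27),
    "Grade 8 English": (-0.01, 0.028, 31),
    "Grade 8 Mathematics": (-0.03, 0.019, 35),
}


def paired_t(mean_diff: float, std_error: float) -> float:
    """Matched pair t statistic: mean difference over its standard error."""
    return mean_diff / std_error


for test, (mean_diff, std_error, df) in TABLE_2.items():
    t = paired_t(mean_diff, std_error)
    print(f"{test}: t({df}) = {t:.2f}")
```

Assuming a two-tailed test at the .05 level, the critical value of t for 26 degrees of freedom is about 2.06, so even the largest value in the table, 2.00, falls just short of significance.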
Table 2

Results of Matched Pair t-tests of r-biserial Between
Experimental and Control Group Tests

                         Mean r_e - r_c   S.E. r_e - r_c    df      t
Grade 5 English               .05              .025         26     2.00
Grade 5 Mathematics           .00              .019         27      .00
Grade 8 English              -.01              .028         31     -.36
Grade 8 Mathematics          -.03              .019         35    -1.58

Figure 1

Regression Lines of Percentages Choosing I Don't Know Response
on P-Rights for the Four Experimental Groups

There was a negative linear relationship between percentage
choosing the I don't know response and percentage correct. The
correlation coefficients ranged from -.56 to -.78. The three regression
lines of the grade five English, grade five mathematics and grade eight
mathematics tests with percentage choosing the I don't know response on
percentage correct could be regarded as having a single regression
slope (Figure 1). The slope of the grade eight English regression
line, however, was much steeper than the slope of the other regression
lines. Results of the comparison of regression showed that
more students chose the I don't know response in mathematics tests
than in English tests, and grade eight students used this response
more often than grade five students.
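The relationship described above can be illustrated by regressing, item by item, the percentage choosing I don't know on the percentage answering correctly. The sketch below computes the Pearson correlation and the least-squares slope for a set of made-up item percentages. The data and the function name are hypothetical and do not come from the study.

```python
import math


def correlation_and_slope(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return Pearson r and the least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical items: percentage correct and percentage choosing I don't know.
pct_correct = [85.0, 72.0, 64.0, 51.0, 40.0, 33.0]
pct_dont_know = [2.0, 6.0, 5.0, 11.0, 12.0, 19.0]

r, slope = correlation_and_slope(pct_correct, pct_dont_know)
print(f"r = {r:.2f}, slope = {slope:.2f}")
```

A negative slope means that harder items (lower percentage correct) draw more I don't know responses. Comparing slopes across tests is what distinguishes the steeper grade eight English line from the other three.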

It was observed that most of the students used the I don't know
response one or more times when given the opportunity. In general,
lower ability groups used this response more often than higher ability
groups, which is what one would expect if the I don't know response
were to reduce guessing. As shown in Table 3, much higher
percentages of students, across all three ability groups, chose the
I don't know response in the fictitious items than in the other
items, even the very difficult items. In the fictitious items,
higher ability students used the I don't know response more often
than lower ability students.

In grade eight tests, male students used the I don't know response
more often than female students (Table 4). However, grade
eight female students had higher criterion scores in both English
and mathematics than male students. In grade five tests, female
students chose the I don't know response more often than male
students, even though female students had higher criterion and test
scores than male students. Female students used the I don't know
response in the fictitious English items more often than male
students. This means that in both grades male students had a greater
tendency to guess than female students.



Table 3

Percentages Choosing I Don't Know Response in Grade 5* and Grade 8
English Test by Ability and P-Rights



                     No. of            Ability Group
P-Rights             Items       Low      Medium     High

...                    2         ?.16      1.26       0
                                10.20      2.00      2.00
...                    5        14.12      3.78       .76
                                 7.34      4.08      1.58
...                    4        11.25      8.15      3.35
                                 6.33      4.48      4.52
...                    2        16.48      5.06      3.44
                                 9.53      8.13     11.1?
...                    6        12.?5      6.12      3.35
                                16.10     1?.1?      8.24
...                    3        21.07     17.02      8.65
                                19.??     18.33     15.70
...                    5        21.22     30.60     20.78
...                   ...       47.05     54.75     39.40
Fictitious items       4        29.40     43.40     44.23
                       4        31.12     49.97     52.95

* Data in italics



Table 4

Average Number of I Don't Know Responses Used by Male and Female Students

                              Male    Female    Total
Grade 5 English
  (27 real items)              ...     3.05      ...
  (4 fictitious items)        1.39     1.68      ...
Grade 5 Mathematics
  (28 items)                  1.86     2.82     2.32
Grade 8 English
  (32 real items)             4.28     2.86     3.57
  (4 fictitious items)        1.78     1.82     1.79
Grade 8 Mathematics           3.38     2.69     3.07



Discussion

Guessing in multiple-choice testing is a very subtle problem. In
the present study, the inclusion of the I don't know response appeared
to be only partially successful in reducing guessing. At the grade five
level, the scores of the two groups were not much different. At grade
eight, the scores of the control group were higher than those of the
experimental group. The latter finding is in agreement with Knapp's
study on mathematics (1968).

Results of matched pair t-test of r-biserials failed to show
any significant difference between tests of the two groups. It
should be remembered, however, that the criterion tests used in the
item analyses were multiple-choice tests without I don't know
responses. In the present study, the ITBS scores were used as the
criterion scores in the item analysis rather than the test scores,
because the ITBS had more items in the test and thus would provide
a more reliable criterion than the much shorter experimental tests.
Furthermore, the criterion score would be independent of the items.
As a check on the possibility that differences might be greater if
the criterion had been total score of which the item was a part, an
additional item analysis was carried out. The biserial correlations
tended to be about .10 higher (reflecting the dependencies), but the
average differences between experimental and control groups remained
essentially the same.
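The r-biserial discrimination index used throughout relates a right/wrong item score to a continuous criterion. The sketch below uses the textbook formula r_b = ((M_right - M_wrong) / s) * (p * q / y), where s is the standard deviation of the criterion and y is the normal-curve ordinate at the point dividing the distribution into proportions p and q. The function name and the data are hypothetical, and the study's own computing procedure is not described in the text.

```python
from statistics import NormalDist, mean, pstdev


def biserial(item_correct: list[int], criterion: list[float]) -> float:
    """Biserial correlation between a 0/1 item score and a criterion score."""
    right = [c for i, c in zip(item_correct, criterion) if i == 1]
    wrong = [c for i, c in zip(item_correct, criterion) if i == 0]
    if not right or not wrong:
        raise ValueError("item must have both right and wrong answers")
    p = len(right) / len(item_correct)  # p-right (item difficulty)
    q = 1.0 - p
    # Ordinate of the standard normal curve at the point cutting off area p.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(right) - mean(wrong)) / pstdev(criterion) * (p * q / y)


# Hypothetical data: eight students' item scores and criterion scores.
item = [1, 1, 0, 1, 0, 1, 0, 1]
itbs = [130.0, 122.0, 118.0, 95.0, 110.0, 101.0, 90.0, 125.0]
print(round(biserial(item, itbs), 2))
```

When the criterion is a total score that includes the item itself, the item contributes to both variables. That overlap is the dependency that raised the correlations by about .10 in the additional analysis.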



... National Assessment Science exercises. Furthermore, the increase in
tendency to use the I don't know response with age is also found in
Sherman's studies. It seems that older children are more aware of
what they don't know and more willing to admit they don't know the
answers. However, if the I don't know response was not provided, as
in the tests taken by the control groups, children at both age levels
in the present study just guessed, as demonstrated by the low
percentage of omits and not reached.

It is not surprising to find that students of low ability used
the I don't know response more often than students of high ability.
However, for the fictitious items, more high ability students chose
the I don't know response. Using the number of responses of I don't
know in fictitious items as an index of guessing, this negative
correlation of achievement with the tendency to guess agrees with
Slakter's finding and not with Swineford's finding (1941). It
should be noted, however, that Swineford (1941) showed that the male
students had a higher tendency to guess than the female students, a
finding confirmed in this study.

It may be argued that since the multiple-choice tests contain
a source of inaccuracy due to guessing, recall tests that require
examinees to supply the answer rather than to choose the best answer
from several alternatives should be used instead. However, tests
with recall items have their problems also; they are often ambiguous
and always difficult to score objectively and speedily, particularly
in large-scale testing programs. Though multiple-choice items may
be contaminated by guessing, they are just too useful to be discarded.
The more promising approach would be more research, like the present
study, to throw light on the nature of guessing in multiple-choice items
and to generate procedures for reducing this nuisance factor in
multiple-choice tests.